Abstract

We provide a direct, elementary proof of the existence of $\lim_{\lambda \to 0} v_\lambda$, where $v_\lambda$ is
the value of the $\lambda$-discounted finite two-person zero-sum stochastic game.

1 Introduction 

Two-person zero-sum stochastic games were introduced by Shapley [3]. They are
described by a 5-tuple $(\Omega, I, J, q, g)$, where $\Omega$ is a finite set of states, $I$ and $J$ are
finite sets of actions, $g : \Omega \times I \times J \to [0,1]$ is the payoff, $q : \Omega \times I \times J \to \Delta(\Omega)$ is the
transition and, for any finite set $X$, $\Delta(X)$ denotes the set of probability distributions
over $X$. The functions $g$ and $q$ are bilinearly extended to $\Omega \times \Delta(I) \times \Delta(J)$. The
stochastic game with initial state $\omega \in \Omega$ and discount factor $\lambda \in (0,1]$ is denoted
by $\Gamma_\lambda(\omega)$ and is played as follows: at stage $m \geq 1$, knowing the current state $\omega_m$,
the players choose actions $(i_m, j_m) \in I \times J$; their choice produces a stage payoff
$g(\omega_m, i_m, j_m)$ and influences the transition: a new state $\omega_{m+1}$ is chosen according to
the probability distribution $q(\cdot \mid \omega_m, i_m, j_m)$. At the end of the game, player 1 receives
$\sum_{m \geq 1} \lambda (1-\lambda)^{m-1} g(\omega_m, i_m, j_m)$ from player 2. The game has a value $v_\lambda(\omega)$,
and $v_\lambda = (v_\lambda(\omega))_{\omega \in \Omega}$ is the unique fixed point of the so-called Shapley operator [5],
i.e. $v_\lambda = \Phi(\lambda, v_\lambda)$, where for all $f \in \mathbb{R}^\Omega$:

$$\Phi(\lambda, f)(\omega) = \operatorname{val}_{(s,t) \in \Delta(I) \times \Delta(J)} \left\{ \lambda g(\omega, s, t) + (1-\lambda)\, \mathbb{E}_{q(\cdot \mid \omega, s, t)}[f(\tilde\omega)] \right\}. \tag{1.1}$$
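As a rough numerical illustration of (1.1), the sketch below iterates the Shapley operator on a game with two actions per player. Everything named here is an assumption made for the example and is not part of the original argument: the function names `val_2x2`, `shapley` and `discounted_value`, the closed-form value of a 2x2 matrix game (used instead of a general linear-programming solver), and the test game, which is the classical "Big Match" with one non-absorbing state and two absorbing states. The iteration relies on the standard fact that $\Phi(\lambda, \cdot)$ is a contraction with modulus $1-\lambda$ for the sup norm.

```python
def val_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximises)."""
    (a, b), (c, d) = m
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin >= minmax:
        # A saddle point in pure strategies exists.
        return maxmin
    # No pure saddle point: both players mix, closed-form value.
    return (a * d - b * c) / (a + d - b - c)


def shapley(lam, f, g, q):
    """One application of Phi(lam, .) to the vector f, following (1.1).

    g[w][i][j] is the stage payoff, q[w][i][j] the list of transition
    probabilities over states. Two actions per player are assumed.
    """
    states = range(len(f))
    out = []
    for w in states:
        m = [[lam * g[w][i][j]
              + (1 - lam) * sum(q[w][i][j][w2] * f[w2] for w2 in states)
              for j in range(2)] for i in range(2)]
        out.append(val_2x2(m))
    return out


def discounted_value(lam, g, q, iterations=500):
    """Approximate v_lam by iterating the contraction Phi(lam, .) from 0."""
    f = [0.0] * len(g)
    for _ in range(iterations):
        f = shapley(lam, f, g, q)
    return f


# Hypothetical test game (Big Match). State 0 is non-absorbing,
# state 1 is absorbing with payoff 1, state 2 is absorbing with payoff 0.
STAY, WIN, LOSE = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
g = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 1.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
q = [[[WIN, LOSE], [STAY, STAY]],
     [[WIN, WIN], [WIN, WIN]],
     [[LOSE, LOSE], [LOSE, LOSE]]]

v = discounted_value(0.1, g, q)
```

For this particular example one can check by hand that $1/2$ solves the fixed-point equation at the non-absorbing state for every $\lambda$, so the first coordinate of `v` should be close to $0.5$.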

The Shapley operator provides optimal stationary strategies for both players. In
particular, the result holds for any signalling structure on past actions. The existence
of $\lim_{\lambda \to 0} v_\lambda$ was established by Bewley and Kohlberg [1], using the Tarski-Seidenberg
elimination theorem.

The purpose of this note is to provide a direct, self-contained proof of the
existence of $\lim_{\lambda \to 0} v_\lambda$. The key idea is to represent the asymptotic behaviour
of a sequence of strategies by a simpler object. Let $(x, y) \in \Delta(I)^\Omega \times \Delta(J)^\Omega$
be a pair of stationary strategies. Every time the state $\omega \in \Omega$ is reached the
next state is distributed according to $q(\cdot \mid \omega, x(\omega), y(\omega))$ and the stage payoff is
$g(\omega, x(\omega), y(\omega))$. Thus, the sequence of states $(\omega_m)_m$ is a Markov chain with
transition $Q = (q(\omega' \mid \omega, x(\omega), y(\omega)))_{(\omega, \omega') \in \Omega^2}$ and the stage payoffs can be described by a
vector $g = (g(\omega, x(\omega), y(\omega)))_{\omega \in \Omega}$. For any initial state $\omega$, the expected payoff induced
by $(x, y)$ in $\Gamma_\lambda(\omega)$ is given by

$$\gamma_\lambda(\omega, x, y) = \sum_{\omega' \in \Omega} t_\lambda(\omega, \omega')\, g(\omega'),$$

where $t_\lambda(\omega, \omega') = \sum_{m \geq 1} \lambda (1-\lambda)^{m-1} Q^{m-1}(\omega, \omega')$ is the mean $\lambda$-discounted time
spent in state $\omega'$.
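The payoff formula above can be evaluated without summing the series, since $\sum_{m \geq 1} \lambda (1-\lambda)^{m-1} Q^{m-1} = \lambda (\mathrm{Id} - (1-\lambda) Q)^{-1}$ (a standard Neumann series identity, not spelled out in the text). The following is a minimal sketch under that identity; the function names `solve` and `discounted_payoff` and the two-state chain in the usage lines are hypothetical, and exact rational arithmetic is used only to keep the example checkable by hand.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def discounted_payoff(lam, Q, g):
    """Vector (gamma_lam(w))_w = lam * (Id - (1 - lam) Q)^{-1} g."""
    n = len(Q)
    a = [[(1 if r == c else 0) - (1 - lam) * Q[r][c] for c in range(n)]
         for r in range(n)]
    return solve(a, [lam * v for v in g])


# Hypothetical chain: state 0 stays or moves to state 1 with probability 1/2,
# state 1 is absorbing; the stage payoff is 1 in state 0 and 0 in state 1.
half = Fraction(1, 2)
Q = [[half, half], [Fraction(0), Fraction(1)]]
payoff = discounted_payoff(half, Q, [Fraction(1), Fraction(0)])
```

Taking for $g$ the indicator of a state $\omega'$ returns the column $t_\lambda(\cdot, \omega')$, and the entries $t_\lambda(\omega, \cdot)$ sum to 1 for each initial state.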

A key observation, due to Solan [5], is that $t_\lambda(\omega, \omega')$ can be written as
a hitting time of an auxiliary Markov chain whose transitions are in the set
$\{0, \lambda, ((1-\lambda) Q(\omega, \omega'))_{(\omega, \omega') \in \Omega^2}\}$. Thus, using a classical result from Freidlin and
Wentzell for finite Markov chains, one deduces that $t_\lambda(\omega, \omega')$ is a rational fraction
in the variables $\lambda$ and $((1-\lambda) Q(\omega, \omega'))_{(\omega, \omega') \in \Omega^2}$, and that both polynomials in the
numerator and denominator have nonnegative coefficients and are of degree at most
$|\Omega|$. For a fixed $y$, a similar assertion is obtained for $\gamma_\lambda(\omega, x, y)$ as a function of the
variables $\lambda$ and $((1-\lambda) x^i(\omega))_{(\omega, i) \in \Omega \times I}$. That is, $\gamma_\lambda(\omega, x, y)$ is a rational fraction in
these variables. One can easily check that the monomials both in the numerator and
denominator can then be written in the following form:

$$C (1-\lambda)^b \lambda^a \prod_{(\omega, i) \in \Omega \times I} x^i(\omega)^{A(\omega, i)}, \tag{1.2}$$

where $C > 0$ depends on $(y, \omega)$ but not on $(x, \lambda)$, $a, b \in \{0, \dots, |\Omega|\}$ and
$A \in \{0, 1\}^{\Omega \times I}$.



1.1 The asymptotic payoff 

Consider now a sequence $(\lambda_n, x_n)_n$, where $\lambda_n \in (0,1]$ is a discount factor and
$x_n \in \Delta(I)^\Omega$ is a stationary strategy, for all $n \in \mathbb{N}$. We study the behaviour of
$\gamma_{\lambda_n}(\omega, x_n, y)$, as $n$ tends to infinity, for a fixed stationary strategy $y \in \Delta(J)^\Omega$.

Definition 1.1. A sequence $(\lambda_n, x_n)_n$ in $(0,1] \times \Delta(I)^\Omega$ is regular if $\lim_{n \to \infty} \lambda_n = 0$
and if for any two monomials of the form (1.2) their ratio converges in $[0, +\infty]$ as
$n$ tends to infinity.¹

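Regular sequences can be characterized by a vector. Indeed, introduce a finite
set:

$$\mathcal{M} := \{(A, a) \mid A \in \{-1, 0, 1\}^{\Omega \times I},\ a \in \{-|\Omega|, \dots, 0, \dots, |\Omega|\}\}.$$

A sequence $(\lambda_n, x_n)_n$ with $\lim_{n \to \infty} \lambda_n = 0$ is regular if for all $(A, a) \in \mathcal{M}$ the following limit

$$L[(\lambda_n, x_n)_n](A, a) := \lim_{n \to \infty} \lambda_n^a \prod_{(\omega, i) \in \Omega \times I} x_n^i(\omega)^{A(\omega, i)} \tag{1.3}$$

exists in $[0, +\infty]$. The regularity of a sequence depends on the existence of finitely
many limits. Thus, by compactness of $[0, +\infty]$ and finitely many extractions of subsequences,
for any family $(x_\lambda)_{\lambda \in (0,1]}$ of stationary strategies there exists a vanishing sequence
$(\lambda_n)_n$ such that $(\lambda_n, x_{\lambda_n})_n$ is regular.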

¹ We use here the natural convention that $\frac{0}{0} = 0^0 = 1$ and $0^\beta = 0$, $0^{-\beta} = \frac{\beta}{0} = +\infty$, for all $\beta > 0$.
Proposition 1.1. Let $y \in \Delta(J)^\Omega$ and $\omega \in \Omega$ be fixed. For any regular sequence
$(\lambda_n, x_n)_n$, $\lim_{n \to \infty} \gamma_{\lambda_n}(\omega, x_n, y)$ exists and depends only on the vector $L[(\lambda_n, x_n)_n]$.

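Proof. Let $(\lambda_n, x_n)_n$ be regular and let $L = L[(\lambda_n, x_n)_n]$. We have already seen
that the expected payoff induced by $(x_n, y)$ in $\Gamma_{\lambda_n}(\omega)$ can be written as a rational
fraction whose monomials are all of the form:

$$m_n := C (1-\lambda_n)^b \lambda_n^a \prod_{(\omega, i) \in \Omega \times I} x_n^i(\omega)^{A(\omega, i)}. \tag{1.4}$$

Since $(1-\lambda_n)^b$ tends to 1, the regularity of the sequence implies
that the ratio of any two monomials $m_n$ and $m'_n$ converges as $n \to \infty$, and that
the limit is determined by $L$ (and the constants $C, C' > 0$). Thus, one can use
the vector $L$ to define an order relation in the set of the monomials in $\gamma_{\lambda_n}(\omega, x_n, y)$
as follows: $m_n \preceq m'_n$ if and only if $\lim_{n \to \infty} m_n / m'_n \in [0, +\infty)$. The set is totally
ordered. Dividing numerator and denominator by some maximal element $m^*$, and
taking $n \to \infty$ we obtain that:

$$\lim_{n \to \infty} \gamma_{\lambda_n}(\omega, x_n, y) = \frac{\sum_{(A, a) \in \mathcal{M}^+} C(A, a)\, L(A - A^*, a - a^*)}{\sum_{(A, a) \in \mathcal{M}^+} C'(A, a)\, L(A - A^*, a - a^*)}, \tag{1.5}$$

where $\mathcal{M}^+ := \{(A, a) \mid A \in \{0, 1\}^{\Omega \times I},\ a \in \{0, \dots, |\Omega|\}\}$ and where the constants
$C(A, a)$ and $C'(A, a)$ are nonnegative for all $(A, a) \in \mathcal{M}^+$. The maximality of $m^*$
ensures that $L(A - A^*, a - a^*) \in [0, +\infty)$, for all $(A, a) \in \mathcal{M}^+$ and that not all are
0. The result follows. □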



1.2 Canonical strategies 

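For any $c = (c(\omega, i))$ and $e = (e(\omega, i))$ in $\mathbb{R}_+^{\Omega \times I}$, we define a family of stationary
strategies as follows:

$$x_\lambda^i(\omega) := \frac{c(\omega, i)\, \lambda^{e(\omega, i)}}{\sum_{i' \in I} c(\omega, i')\, \lambda^{e(\omega, i')}}, \quad \forall (\omega, i) \in \Omega \times I,\ \forall \lambda \in (0, 1]. \tag{1.6}$$

Assume, in addition, that $\sum_{i \in I,\ e(\omega, i) = 0} c(\omega, i) = 1$ for all $\omega$, so that

$$x_\lambda^i(\omega) \sim_{\lambda \to 0} c(\omega, i)\, \lambda^{e(\omega, i)}, \quad \forall (\omega, i) \in \Omega \times I. \tag{1.7}$$

The exponent $e(\omega, i)$ determines the order of magnitude of the probability of playing the
action $i$ at state $\omega$ asymptotically; the coefficient $c(\omega, i)$ its intensity.

Definition 1.2. A family of strategies $(\mathbf{x}_\lambda)_{\lambda \in (0,1]}$ is canonical if it is induced by
some $\mathbf{x} = (c, e)$ in the following set:

$$\mathbf{X} = \Big\{ (c, e) \in (\mathbb{R}_+^* \times \mathbb{R}_+)^{\Omega \times I} \ \Big|\ \forall \omega \in \Omega,\ \sum_{i \in I,\ e(\omega, i) = 0} c(\omega, i) = 1 \Big\}.$$

Note that the coefficients are taken strictly positive.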

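For all $(A, a) \in \mathcal{M}$ and $\mathbf{x} = (c, e) \in \mathbf{X}$ the following limit exists:

$$L_{\mathbf{x}}(A, a) := \lim_{\lambda \to 0} \lambda^a \prod_{(\omega, i) \in \Omega \times I} \mathbf{x}_\lambda^i(\omega)^{A(\omega, i)}. \tag{1.8}$$

Indeed, a direct consequence of (1.7) is that:

$$L_{\mathbf{x}}(A, a) = \lim_{\lambda \to 0} \lambda^{a + \sum_{(\omega, i)} A(\omega, i) e(\omega, i)} \prod_{(\omega, i)} c(\omega, i)^{A(\omega, i)},$$

where $\prod_{(\omega, i)} c(\omega, i)^{A(\omega, i)} > 0$. Thus:

$$L_{\mathbf{x}}(A, a) \in \begin{cases} \{0\}, & \text{iff } a + \sum_{(\omega, i)} A(\omega, i) e(\omega, i) > 0, \\ \{+\infty\}, & \text{iff } a + \sum_{(\omega, i)} A(\omega, i) e(\omega, i) < 0, \\ (0, +\infty), & \text{iff } a + \sum_{(\omega, i)} A(\omega, i) e(\omega, i) = 0. \end{cases} \tag{1.9}$$

Thus, for any $\mathbf{x} \in \mathbf{X}$ and any vanishing sequence $(\lambda_n)_n$ of discount factors, the
sequence $(\lambda_n, \mathbf{x}_{\lambda_n})_n$ is regular. Moreover, $L_{\mathbf{x}} = L[(\lambda_n, \mathbf{x}_{\lambda_n})_n]$ for any such sequence.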
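The trichotomy (1.9) is easy to evaluate mechanically. The sketch below builds the strategy of (1.6) from a pair $(c, e)$ and returns the limit prescribed by (1.9); the function names `canonical_strategy` and `limit_vector`, the dictionary encoding keyed by (state, action), and the one-state example are assumptions made for illustration only. Exponents are taken as integers so that the sign test on $a + \sum A e$ is exact.

```python
import math


def canonical_strategy(c, e, lam):
    """Strategy x_lam of (1.6); c and e are dicts keyed by (state, action)."""
    norm = {}
    for (w, i), coef in c.items():
        norm[w] = norm.get(w, 0.0) + coef * lam ** e[(w, i)]
    return {(w, i): coef * lam ** e[(w, i)] / norm[w]
            for (w, i), coef in c.items()}


def limit_vector(c, e, A, a):
    """Limit L_x(A, a) as given by (1.9).

    A is a dict keyed by (state, action); missing keys count as 0.
    """
    s = a + sum(A.get(k, 0) * e[k] for k in e)
    if s > 0:
        return 0.0
    if s < 0:
        return math.inf
    # Exponent of lam vanishes: the limit is the product of coefficients.
    return math.prod(c[k] ** A.get(k, 0) for k in c)


# Hypothetical one-state example: action 0 is played with probability
# close to 1, action 1 with probability of order 2 * lam.
c = {(0, 0): 1.0, (0, 1): 2.0}
e = {(0, 0): 0, (0, 1): 1}
x = canonical_strategy(c, e, 1e-6)
ratio = x[(0, 1)] / 1e-6  # numerical counterpart of L_x(A, -1), A = 1 at (0, 1)
```

In this example `limit_vector(c, e, {(0, 1): 1}, -1)` returns the coefficient `2.0`, and `ratio` is a finite-$\lambda$ approximation of the same quantity.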



2 Main results 

2.1 Representation of a regular sequence by a canonical 
strategy 

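Fix some regular sequence $(\lambda_n, x_n)_n$ throughout this section and let $L = L[(\lambda_n, x_n)_n] \in [0, +\infty]^{\mathcal{M}}$
be the vector defined in (1.3). Notice that $L$ has many elementary properties:

(P1) $L(0, 0) = 1$ and, for all $(A, a) \neq 0$, $L(A, a) = +\infty$ if and only if $L(-A, -a) = 0$;

(P2) For all $\mu \in \mathbb{R}$, $L(0, \mu) := \lim_{n \to \infty} \lambda_n^\mu = 0 \iff \mu > 0$ and $L(0, \mu) \in (0, +\infty) \iff \mu = 0$.
In particular, $L(0, \mu) \in \{0, 1, +\infty\}$ for all $\mu \in \mathbb{R}$;

(P3) If $L(A, a) < +\infty$, $L(\mu A, \mu a) := \lim_{n \to \infty} \lambda_n^{\mu a} \prod_{(\omega, i)} x_n^i(\omega)^{\mu A(\omega, i)} = L(A, a)^\mu$, for all $\mu \geq 0$;

(P4) If $L(A, a) < +\infty$ and $L(B, b) < +\infty$, then $L(A + B, a + b) = L(A, a) L(B, b)$.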

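Proposition 2.1. There exists $\mathbf{x} \in \mathbf{X}$ such that $L_{\mathbf{x}} = L$.

Proof. Note that $\prod_{(\omega, i)} c(\omega, i)^{A(\omega, i)} > 0$, for any $A \in \{-1, 0, 1\}^{\Omega \times I}$. Thus, from
(1.9) and (P1) one deduces the following necessary and sufficient conditions on the
coefficients and the exponents $(c, e)$ of $\mathbf{x}$ for having $L_{\mathbf{x}} = L$:

$$\sum_{(\omega, i)} A(\omega, i) e(\omega, i) + a > 0, \quad \forall (A, a) \in \mathcal{M} \text{ s.t. } L(A, a) = 0, \tag{2.1}$$

$$\sum_{(\omega, i)} A(\omega, i) e(\omega, i) + a = 0, \quad \forall (A, a) \in \mathcal{M} \text{ s.t. } L(A, a) \in (0, +\infty), \tag{2.2}$$

$$\prod_{(\omega, i)} c(\omega, i)^{A(\omega, i)} = L(A, a), \quad \forall (A, a) \in \mathcal{M} \text{ s.t. } L(A, a) \in (0, +\infty). \tag{2.3}$$

Notation: Let $\mathcal{L}^0 := \{(A, a) \in \mathcal{M} \mid L(A, a) = 0\}$ and $\mathcal{L}^+ := \{(A, a) \in \mathcal{M} \mid L(A, a) \in (0, +\infty)\}$.
Put $\mathcal{L} := \mathcal{L}^0 \cup \mathcal{L}^+$.

Solving for the exponents. Let us prove that the system (2.1)-(2.2) has a solution.
One and only one of the systems (2.1)-(2.2) and (2.4)-(2.5)-(2.6) is consistent
(see Mertens, Sorin and Zamir [3], part A, page 28):

$$\sum_{(A, a) \in \mathcal{L}} \mu(A, a) A = 0, \quad \mu|_{\mathcal{L}^0} \geq 0, \tag{2.4}$$

$$-\sum_{(A, a) \in \mathcal{L}} \mu(A, a)\, a \geq 0, \tag{2.5}$$

$$-\sum_{(A, a) \in \mathcal{L}} \mu(A, a)\, a + \sum_{(A, a) \in \mathcal{L}^0} \mu(A, a) > 0. \tag{2.6}$$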
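Let us prove that the system (2.4)-(2.5)-(2.6), with unknowns $\mu = (\mu(A, a))_{(A, a) \in \mathcal{L}} \in \mathbb{R}^{\mathcal{L}}$,
is inconsistent. In (2.4), $\mu|_{\mathcal{L}^0} := (\mu(A, a))_{(A, a) \in \mathcal{L}^0}$ denotes the restriction of $\mu$ to
$\mathcal{L}^0$. Assume (2.4). On the one hand, by (P3)-(P4), for all $\mu \in \mathbb{R}^{\mathcal{L}}$:

$$\prod_{(A, a) \in \mathcal{L}^+} L(A, a)^{\mu(A, a)} = L\Big( \sum_{(A, a) \in \mathcal{L}^+} \mu(A, a) A,\ \sum_{(A, a) \in \mathcal{L}^+} \mu(A, a)\, a \Big) \in (0, +\infty). \tag{2.7}$$

On the other hand, by (P3)-(P4), for all $\mu \in \mathbb{R}^{\mathcal{L}}$ such that $\mu|_{\mathcal{L}^0} \geq 0$ one has:

$$\prod_{(A, a) \in \mathcal{L}^0} L(A, a)^{\mu(A, a)} = L\Big( \sum_{(A, a) \in \mathcal{L}^0} \mu(A, a) A,\ \sum_{(A, a) \in \mathcal{L}^0} \mu(A, a)\, a \Big) = \begin{cases} 1 & \text{if } \mu|_{\mathcal{L}^0} = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2.8}$$

Multiplying (2.7) and (2.8) yields, by assumption (2.4):

$$L\Big( 0,\ \sum_{(A, a) \in \mathcal{L}} \mu(A, a)\, a \Big) \in \begin{cases} (0, +\infty) & \text{if } \mu|_{\mathcal{L}^0} = 0, \\ \{0\} & \text{otherwise.} \end{cases} \tag{2.9}$$

By (P2), the first case implies $\sum_{(A, a) \in \mathcal{L}} \mu(A, a)\, a = 0$, which contradicts (2.6), and
the second case implies $\sum_{(A, a) \in \mathcal{L}} \mu(A, a)\, a > 0$, which contradicts (2.5). The system
(2.4)-(2.5)-(2.6) being inconsistent, the existence of a solution to (2.1)-(2.2) in $\mathbb{R}^{\Omega \times I}$
follows. The boundedness of $x_n^i(\omega)$ implies that $L((0, \dots, 1_{(\omega, i)}, \dots, 0), 0) < +\infty$,
so that $e(\omega, i) \geq 0$ by (2.1) and (2.2).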

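Solving for the coefficients. Taking the logarithm in (2.3) yields:

$$\sum_{(\omega, i)} A(\omega, i) \ln c(\omega, i) = \ln(L(A, a)), \quad \forall (A, a) \in \mathcal{L}^+, \tag{2.10}$$

which is a linear system in $d = (\ln c(\omega, i))_{(\omega, i)} \in \mathbb{R}^{\Omega \times I}$. As before, one and only one
of the systems (2.10) and (2.11) is consistent:

$$\sum_{(A, a) \in \mathcal{L}^+} \mu(A, a) A = 0, \quad \sum_{(A, a) \in \mathcal{L}^+} \mu(A, a) \ln(L(A, a)) > 0. \tag{2.11}$$

Let us prove that the system (2.11), with unknowns $\mu = (\mu(A, a))_{(A, a) \in \mathcal{L}^+} \in \mathbb{R}^{\mathcal{L}^+}$, is
inconsistent. Suppose that $\sum_{(A, a) \in \mathcal{L}^+} \mu(A, a) A = 0$. Then, by (P3)-(P4):

$$\prod_{(A, a) \in \mathcal{L}^+} L(A, a)^{\mu(A, a)} = L\Big( 0,\ \sum_{(A, a) \in \mathcal{L}^+} \mu(A, a)\, a \Big) \in (0, +\infty).$$

By (P2), this implies $\sum_{(A, a) \in \mathcal{L}^+} \mu(A, a)\, a = 0$ and, a fortiori, $\prod_{(A, a) \in \mathcal{L}^+} L(A, a)^{\mu(A, a)} = 1$,
so that (2.11) fails. Consequently, there exists $c = (\exp(d(\omega, i)))_{(\omega, i)} \in (\mathbb{R}_+^*)^{\Omega \times I}$
satisfying (2.3). □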

2.2 Convergence of the discounted values 

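Theorem 2.1. The limit of $(v_\lambda)_\lambda$, as $\lambda$ tends to 0, exists. Moreover, there exists
$\mathbf{x} \in \mathbf{X}$ such that $(\mathbf{x}_\lambda)_\lambda$ is asymptotically optimal, i.e. for all $\varepsilon > 0$, there exists
$\lambda_0 \in (0, 1]$ such that:

$$\gamma_\lambda(\omega, \mathbf{x}_\lambda, y) \geq \lim_{\lambda \to 0} v_\lambda(\omega) - \varepsilon, \quad \forall \omega \in \Omega,\ \forall y \in \Delta(J)^\Omega,\ \forall \lambda \in (0, \lambda_0).$$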
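Proof. Let $\omega \in \Omega$ be fixed. Let $(x_\lambda)_{\lambda > 0}$ be a family of optimal stationary
strategies in $(\Gamma_\lambda(\omega))_{\lambda > 0}$ and let $(\lambda_n)_n$ be a sequence of discount factors such
that $\lim_{n \to \infty} v_{\lambda_n}(\omega) = \limsup_{\lambda \to 0} v_\lambda(\omega)$. The optimality of $x_{\lambda_n}$ implies that
$\gamma_{\lambda_n}(\omega, x_{\lambda_n}, j) \geq v_{\lambda_n}(\omega)$, for all $j \in J^\Omega$. Indeed, against a stationary strategy of
player 1, player 2 faces a Markov decision process. Thus, player 2 has a pure
stationary best reply. Up to some subsequence, $(\lambda_n, x_{\lambda_n})_n$ is regular. By Proposition
2.1 there exists $\mathbf{x} \in \mathbf{X}$ such that $L_{\mathbf{x}} = L[(\lambda_n, x_{\lambda_n})_n]$. Thus, by Proposition 1.1:

$$\lim_{n \to \infty} \gamma_{\lambda_n}(\omega, \mathbf{x}_{\lambda_n}, j) = \lim_{n \to \infty} \gamma_{\lambda_n}(\omega, x_{\lambda_n}, j), \quad \forall j \in J^\Omega.$$

On the other hand, the limit $\lim_{\lambda \to 0} \gamma_\lambda(\omega, \mathbf{x}_\lambda, j)$ exists. Consequently:

$$\lim_{\lambda \to 0} \gamma_\lambda(\omega, \mathbf{x}_\lambda, j) = \lim_{n \to \infty} \gamma_{\lambda_n}(\omega, x_{\lambda_n}, j) \geq \limsup_{\lambda \to 0} v_\lambda(\omega), \quad \forall j \in J^\Omega. \tag{2.12}$$

It follows that for all $\varepsilon > 0$ there exists $\lambda_0 \in (0, 1]$ such that:

$$\min_{j \in J^\Omega} \gamma_\lambda(\omega, \mathbf{x}_\lambda, j) \geq \limsup_{\lambda \to 0} v_\lambda(\omega) - \varepsilon, \quad \forall \lambda \in (0, \lambda_0). \tag{2.13}$$

The latter implies that $v_\lambda(\omega) \geq \limsup_{\lambda \to 0} v_\lambda(\omega) - \varepsilon$, for all $\lambda \in (0, \lambda_0)$, and the
existence of $\lim_{\lambda \to 0} v_\lambda$ follows by taking the lim inf. The canonical strategy $\mathbf{x}$ has
the desired property. □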

2.3 Concluding remarks 

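(1) Consider an infinitely repeated stochastic game where the past actions are
observed. The existence of the uniform value is due to Mertens and Neyman
[2] and relies on the following result:

Theorem 2.2. Let $f : (0, 1) \to \mathbb{R}^\Omega$ be a function such that:

(a) $\|f_\lambda - f_{\lambda'}\| \leq \int_\lambda^{\lambda'} \psi(x)\, dx$, for all $0 < \lambda < \lambda' < 1$ and for some
$\psi \in L^1((0, 1], \mathbb{R}_+)$;

(b) There exists $\lambda_0 > 0$ such that $\Phi(\lambda, f_\lambda) \geq f_\lambda$, for all $\lambda \in (0, \lambda_0)$.²

Then, player 1 can guarantee $\lim_{\lambda \to 0} f_\lambda$ in $\Gamma_\infty$.

One can use Theorem 2.1 to prove the existence of the uniform value. Indeed,
for any $x \in \Delta(I)^\Omega$, $\omega \in \Omega$ and $\lambda \in (0, 1]$, let $w_\lambda^x(\omega) := \min_{j \in J^\Omega} \gamma_\lambda(\omega, x, j)$ be
the payoff guaranteed by $x$ in $\Gamma_\lambda(\omega)$. One can check that $w_\lambda^x \leq \Phi(\lambda, w_\lambda^x)$, for
all $\lambda \in (0, 1]$. Besides, for any $\mathbf{x} \in \mathbf{X}$, the functions $(\lambda \mapsto w_\lambda^{\mathbf{x}_\lambda}(\omega))_{\omega \in \Omega}$ are of
bounded variation, so that player 1 can guarantee $\lim_{\lambda \to 0} w_\lambda^{\mathbf{x}_\lambda}$ for any $\mathbf{x} \in \mathbf{X}$
by Theorem 2.2. In particular, if $(\mathbf{x}_\lambda)_\lambda$ is asymptotically optimal, player 1 can
guarantee $\lim_{\lambda \to 0} v_\lambda$.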

(2) The existence of an $\mathbf{x} \in \mathbf{X}$ such that $(\mathbf{x}_\lambda)_\lambda$ is asymptotically optimal was
already noticed by Solan and Vieille [6]. The result was deduced from the
semi-algebraicity of $\lambda \mapsto v_\lambda$, obtained in [1] using the Tarski-Seidenberg elimination
theorem.

² $\Phi$ is the Shapley operator, defined in (1.1).
(3) In the system (2.1)-(2.2) for the exponents (first part of the proof of Proposition
2.1) note that all the entries of $A$ are in $\{-1, 0, 1\}$. This implies the existence
of a solution having all its coordinates in $\{0, 1/N, 2/N, \dots\}$, for some N <
|ft||Z|v^ffl.

(4) Our approach fails without the finiteness assumption on $I$, $J$ and $\Omega$. A recent
example where $I$ and $J$ are compact, $q$ is continuous, $g$ is independent of the
actions and the family $(v_\lambda)_\lambda$ does not converge is due to Vigeral [7].